(54) Method and apparatus for motion field estimation, segmentation and coding



(57) The present invention discloses an improved
method and apparatus for prediction coding with motion
estimation, in which the frame-to-frame changes result- 
ing from motion in an image depicted in a video frame 
are detected and coded for transmission to a video re- 
ceiver. The motion estimation technique of the present 
invention uses a hierarchical approach in which a motion 
vector updating routine is performed with respect to mul- 
tiple levels of smaller and smaller regions of a frame. The 
motion vector updating routine updates the motion vec- 
tor of a smaller region by assigning to it a best motion 
vector selected from among an initial motion vector as- 
signed to the smaller region, motion vectors of neighbor- 
ing regions, and a matched motion vector obtained by 
performing a block matching technique for the smaller 
region. The best motion vector for each region is select- 
ed according to a priority scheme and a predetermined 
threshold value. Adjacent regions having the same mo- 
tion vector are then merged together, and a region shape 
representation routine is used to specify contour pixels 
that will allow the merged regions to be recovered by a 
decoder. A contour coding routine is then used to encode
the contour pixels for transmission to the decoder. The 
present method requires less information about the mo- 
tion field to be sent to the decoder and results in an im- 
proved prediction error compared to the H.261 standard 
currently used. The present invention is particularly advantageous
for very low bit-rate applications.

Description

FIELD OF THE INVENTION 

The present invention relates generally to a method 
and apparatus for prediction coding utilizing information
that is already present in a video receiver in order to describe
a current frame with as little information as possible.
More particularly, the present invention relates to an
improved method and apparatus for prediction coding 
with motion estimation in which the frame-to-frame 
changes resulting from motion in an image depicted in a 
frame are detected and coded for transmission to a video 
receiver. 

BACKGROUND OF THE INVENTION 

Motion estimation and compensation techniques 
have received increasing attention for the transmission 
and storage of digital image sequences. For some digital 
video applications, high compression ratios have been 
achieved by using motion compensation methods to reduce
inherent temporal pixel redundancies in image sequences.
In such techniques, a motion field is estimated
at an encoder. The motion field relates object locations 
in a previous frame of a sequence to their new locations 
in the current frame. Pixel intensities of the previous and 
current frames are used to compute the estimated mo- 
tion field. This motion field estimate must then be recon- 
structed at a decoder without the benefit of intensities of 
the pixels in the current frame. 

The principle of motion field estimation, which is well 
known in the prior art, may be better understood with respect
to FIG. 1, which shows a preceding frame and a
present frame. An object positioned at point A' in the preceding
frame is moved to point B in the present frame.
A two dimensional displacement or motion vector, v, is
calculated from the point A' in the preceding frame to
point B' in the preceding frame, where point B' corresponds
to point B in the current frame. A signal I'(r + v)
at point A' instead of a signal I'(r) at point B' is used as a
motion compensated prediction signal and is subtracted
from a signal I(r) at point B so as to obtain a prediction
error signal I(r) - I'(r + v), where r is the position vector
which indicates a given position on the video screen. In
motion compensated coding, the prediction error signal
I(r) - I'(r + v) is smaller than the prediction error signal
I(r) - I'(r). The former prediction error signal, therefore,
can be used effectively to code an image signal with a 
moving object. 
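
The comparison between the two prediction error signals can be illustrated with a minimal sketch. The one-row frames, the function name `prediction_error` and the sign convention (v is chosen so that r + v lands on point A' in the preceding frame) are assumptions made here for illustration; they are not taken from FIG. 1.

```python
def prediction_error(current, previous, r, v=(0, 0)):
    """Return I(r) - I'(r + v) for the pixel position r = (x, y)."""
    x, y = r
    dx, dy = v
    return current[y][x] - previous[y + dy][x + dx]


# Toy 1 x 6 frames: a bright object moves two pixels to the right.
previous = [[0, 9, 9, 0, 0, 0]]   # I', the preceding frame
current = [[0, 0, 0, 9, 9, 0]]    # I, the present frame

r = (3, 0)     # point B in the present frame
v = (-2, 0)    # r + v is point A' in the preceding frame

uncompensated = prediction_error(current, previous, r)     # I(r) - I'(r)
compensated = prediction_error(current, previous, r, v)    # I(r) - I'(r + v)
```

In this toy case the motion compensated error is zero while the uncompensated error equals the full object intensity, which is the effect the paragraph above describes.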

Block-based techniques represent one type of mo- 
tion compensation method which computes motion vec- 
tors at an encoder and transmits them to a decoder 
where the motion field is constructed. In block-based video
coding techniques, such as the one described in U.S.
Patent No. 4,307,420, a frame is divided into non-over- 
lapping blocks or regions of N x N pixels. In order to limit 
the amount of information that must be transmitted to the
decoder, block-based methods assume that blocks of
pixels move with constant translational motion. A best
match for each block is determined in the previously
transmitted frame, where the criterion is typically the mean
absolute difference between the intensities of the two
blocks. The relative difference in position between the
current block and the matched block in the previous
frame is the motion vector. The intensity of the matched
block is subtracted from the intensity of the current block
in order to obtain the displaced frame difference (DFD).
The collection of all the motion vectors for a particular
frame forms a motion field. The motion field and the displaced
frame differences are then transmitted from the
encoder to the decoder, which predicts the new image
based upon this transmitted information and the previous
image in the sequence of images.
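
A full-search block matcher using the mean absolute difference criterion might be sketched as follows. This is an illustrative sketch only, not the implementation of U.S. Patent No. 4,307,420; the function names, the 0-based indexing and the toy frames are assumptions introduced here.

```python
def mean_abs_diff(cur, prev, x0, y0, n, dx, dy):
    """Mean absolute difference between the n x n block of `cur` at (x0, y0)
    and the block of `prev` displaced by (dx, dy)."""
    total = 0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            total += abs(cur[y][x] - prev[y + dy][x + dx])
    return total / (n * n)


def block_match(cur, prev, x0, y0, n, search):
    """Exhaustively test every displacement within +/- `search` pixels."""
    height, width = len(prev), len(prev[0])
    best_v, best_err = (0, 0), mean_abs_diff(cur, prev, x0, y0, n, 0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= x0 + dx and x0 + dx + n <= width and
                      0 <= y0 + dy and y0 + dy + n <= height)
            if not inside:
                continue  # the displaced block would leave the frame
            err = mean_abs_diff(cur, prev, x0, y0, n, dx, dy)
            if err < best_err:
                best_v, best_err = (dx, dy), err
    return best_v, best_err


def make_frame(size, px, py):
    """Black frame with a 2 x 2 bright patch whose top-left pixel is (px, py)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(py, py + 2):
        for x in range(px, px + 2):
            frame[y][x] = 9
    return frame


previous = make_frame(6, 1, 1)
current = make_frame(6, 3, 2)
vector, error = block_match(current, previous, x0=3, y0=2, n=2, search=2)
```

Here `vector` is the displacement from the current block to its best match in the previous frame, and subtracting the matched block from the current block pixel by pixel would give the displaced frame difference for that block.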

One inherent difficulty in block-matching techniques 
results from the assumption that motion is constant within
any given block. When objects in a particular block
move at different velocities, the motion vector obtained
may correspond to only one, or possibly even none, of
the objects in the block. If the size of the blocks is decreased,
then the assumption becomes more valid. The
overhead of computation and transmission of displacement
or motion information, however, increases.

One method for improving motion estimation and
compensation, proposed by M.T. Orchard in "Predictive
Motion-Field Segmentation For Image Sequence Coding,"
IEEE Transactions on Circuits and Systems For
Video Technology, Vol. 3 (Feb. 1993), involves segmenting
the motion field of frames in a sequence and
using the segmentation to predict the location of motion-field
discontinuities in the current frame. Motion estimates
for each segmented region are chosen from
among the motion vectors of the nearest neighboring regions
based upon the motion vector that minimizes the
prediction error. A scheme is then presented for predicting
the segmentation at the decoder computed from previously
decoded frames.

A similar technique of motion estimation and segmentation
is disclosed in Liu et al., "A Simple Method To
Segment Motion Field For Video Coding," SPIE Visual
Communications and Image Processing, Vol. 1818, pp.
542-551 (1992). Motion vectors for blocks of sixteen by
sixteen pixels are first determined by block matching,
and each block is then divided into sixteen sub-blocks of
four by four pixels. A motion vector is chosen for each
sub-block from among the motion vectors of the larger
block and neighboring blocks such that the prediction error
is minimized.

SUMMARY OF THE INVENTION 

The present invention discloses an improved method
and apparatus for prediction coding with motion estimation,
in which the frame-to-frame changes resulting
from motion in an image depicted in a video frame are
detected and coded for transmission to a video receiver.
The motion estimation technique of the present invention
uses a hierarchical approach in which a motion vector 
updating routine is performed with respect to multiple 
levels of smaller and smaller regions of a frame. The mo- 
tion vector updating routine updates the motion vector of 
a smaller region by assigning to it a best motion vector 
selected from among an initial motion vector assigned to 
the smaller region, motion vectors of neighboring re- 
gions, and a matched motion vector obtained by per- 
forming a block matching technique for the smaller re- 
gion. The best motion vector for each region is selected 
according to a priority scheme and a predetermined 
threshold value. 

Other features and advantages of the present inven- 
tion will be readily apparent by reference to the following 
detailed description and accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows the principle of frame-to-frame motion 
field estimation. 

FIG. 2 is a block diagram of a motion estimation and 
segmentation unit according to the present invention. 

FIGS. 3A - 3C are a flow chart showing the steps of 
motion estimation, segmentation and coding according 
to the method of the present invention. 

FIG. 4 shows an exemplary first level segmentation 
of a video frame. 

FIG. 5 shows an exemplary first level segmentation 
of a video frame indicating the region whose motion vec- 
tor is currently being updated. 

FIG. 6 is a flow chart showing the steps for selecting 
the best motion vector for the region under consideration 
according to the method of the present invention. 

FIG. 7 shows an exemplary second level segmen- 
tation of a video frame. 

FIG. 8 shows an exemplary frame having merged 
regions. 

FIG. 9 shows the contour pixels that result from per- 
forming a region shape representation routine for all the 
pixels in the frame of FIG. 8. 

FIG. 10 is a flow chart showing the steps of a contour
coding routine according to the present invention. 

FIGS. 11A - 11C show an exemplary frame during
various stages of the contour coding routine. 

FIG. 12 is a block diagram of a system incorporating 
motion estimation, segmentation and coding according 
to the present invention. 

DETAILED DESCRIPTION OF THE PRESENT 
INVENTION 

Motion Field Estimation and Segmentation 

FIG. 2 is a block diagram of a motion estimation and 
segmentation unit 200 for determining the motion vector 
of pixels or groups of pixels in a present frame with respect
to pixels or groups of pixels in a preceding frame
according to the present invention. The unit 200 has several
read-write memory units 205, 210, 215, 220 and
240. The previous decoded frame memory unit 205 has
sufficient memory for storing a monochrome intensity
corresponding to each pixel in the preceding frame. Similarly,
the current frame memory unit 210 has sufficient
memory for storing a monochrome intensity corresponding
to each pixel in the current frame. Each pixel in a
frame is referenced by horizontal and vertical coordinates,
(x,y). For example, the pixel in the upper-left corner
of the frame would have the coordinates (1,1), and
the pixel in the lower-right corner of the frame would have
the coordinates (M, N) in a frame having a total of (M x
N) pixels. The motion field memory unit 215 has sufficient
memory for storing a calculated two-dimensional motion
vector corresponding to each pixel in the current frame.

The candidate motion vector memory unit 220
stores the values of motion vectors which, according to
the method of the present invention, are candidates for
refining or updating the value of the motion vector corresponding
to a particular region or block of pixels in the
current frame. In particular, the memory unit 220 has a
file 221 for storing the value of a motion vector initially
assigned to a region whose motion vector is being updated.
The memory unit 220 also has eight files 222-229
for storing motion vectors assigned to regions neighboring
the region whose motion vector is being updated, as
explained further below. The memory unit 220 is connected
to the motion field memory unit 215 so that it can
receive the value of the motion vector initially assigned
to the region whose motion vector is being updated as
well as the motion vectors of neighboring regions. Finally,
the memory unit 220 has a file 230 for storing a motion
vector calculated by a motion refinement unit 260.

The motion refinement unit 260 receives data from
the previous decoded frame memory unit 205, the current
frame memory unit 210 and the motion field memory
unit 215. The motion refinement unit 260 may be any
suitable device or system in the prior art that performs a
motion estimation technique which attempts to improve
the motion vector initially assigned to the region whose
motion vector is being updated by minimizing the prediction
or matching error as defined below. Such motion estimation
techniques are well known in the art and include,
for example, block matching methods and their implementations,
such as the one described in U.S. Patent
No. 4,307,420. The output of the motion refinement unit
260 provides a matched motion vector which is a two
dimensional motion vector relating a region in the current
frame to a region in the previous frame.

The matching error memory unit 240 stores matching
errors corresponding to each of the motion vectors
stored in the memory unit 220. The matching errors are
computed in a matching error computing unit 255 which
may be implemented in hardware or software. In general,
the matching error is an indication of the error in prediction
which results from using the monochrome intensity
b(z-D, t-t') of the previous frame and a particular two dimensional
motion vector, D, to predict the monochrome
intensity b(z,t) of the current frame, where z is the two
dimensional vector of spatial position and t' is the time
interval between the two frames. The matching error or
prediction error may be defined as the summation over
all positions within a region under consideration,
Σ N(b(z,t) - b(z-D, t-t')), where N is a distance metric such as
the magnitude or the square function. Several simplifications
for calculating matching errors have been suggested
in the literature, some of which are summarized
in section 5.2.3g of the text Digital Pictures: Representation
and Compression, by A.N. Netravali and B.J.
Haskell (Plenum Press 1991). The contents of this publication,
and all other patents and publications referred
to herein, are incorporated by reference into the present
specification. The matching error computing unit 255
may be, for example, an electrical circuit, having an accumulator
with a Manhattan adder, which computes a
matching error according to the definition set forth above
or one of the simplifications indicated above.
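
For reference, the definition of the matching error given above can be written compactly as follows. The labels E(D), for the matching error of a candidate vector D, and R, for the set of pixel positions in the region under consideration, are introduced here for illustration and do not appear in the original text.

```latex
E(D) = \sum_{z \in R} N\bigl(b(z,t) - b(z-D,\,t-t')\bigr),
\qquad N(e) = |e| \quad \text{or} \quad N(e) = e^{2}
```

With N(e) = |e| the sum is the sum of absolute differences; dividing it by the number of pixels in R would give the mean absolute difference criterion mentioned in the background section.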

The matching error computing unit 255 receives in- 
put data from the previous frame memory unit 205, the 
current frame memory unit 210, and the candidate mo- 
tion memory unit 220. In one embodiment, the matching 
error computing unit 255 also may receive a motion vec- 
tor directly from the motion refinement unit 260. In an 
alternative embodiment, however, the motion vector cal- 
culated by the motion refinement unit 260 may be re- 
trieved from the file 230 in the memory unit 220. Each 
matching error which is calculated for one of the motion 
vectors stored in the files 221-230 is stored in one of several
corresponding files 241-250 in the matching error
memory unit 240. 

The motion estimation unit 200 also has a minimum 
detector unit 270 which determines the smallest value 
from among the matching errors stored in the files 
241-250. The minimum detector unit 270 may be, for example,
an electrical circuit comprising a comparator. An
output of the minimum detector unit 270 is connected to 
a best motion vector selection unit 280. The best motion 
vector selection unit 280 determines which of the motion 
vectors stored in the memory unit 220 is the best motion 
vector for updating or refining the motion vector of the 
region or block which is being updated. The selection unit 
280, which may be implemented as a general purpose
computer with appropriate software or as an electronic 
circuit, makes the above determination based upon a 
predetermined threshold value and a priority scheme as 
explained further below. 

The selection unit 280 also receives inputs from the 
matching error memory unit 240 and the candidate mo- 
tion vector memory unit 220. In one embodiment, the se- 
lection unit 280 also receives a calculated motion vector 
from the motion refinement unit 260. In an alternative 
embodiment, however, the motion vector calculated by 
the motion refinement unit 260 is retrieved from the file 
230 in the memory unit 220. The output of the selection 
unit 280 is a motion vector which is stored as the updated
or refined motion vector in the motion field memory unit
215.

A control unit 290, which may be external to the motion
estimation and segmentation unit 200, is connected
to each of the other components in the unit 200. The control
unit 290 may be, for example, a central processing
unit (CPU) or a processing element which controls the
other units in the unit 200 and their interaction.

FIGS. 3A - 3C are a flow chart showing the steps of
motion estimation, segmentation and coding according
to the present invention. The method of motion estimation
and segmentation according to the present invention
involves a hierarchical approach in which a motion vector
updating routine is performed with respect to multiple
levels of smaller and smaller regions of a frame. The
method of processing a frame begins in step 300 at
which point it is assumed that a preceding frame exists
with reference to which a current frame can be predicted.
In step 305, the current frame is segmented or divided
into smaller regions of a predetermined shape. This segmentation
represents a first segmentation level. While
the predetermined shape may be arbitrary, triangular, or
rectangular, the presently preferred predetermined
shape is rectangular. Where the predetermined shape is
rectangular, the frame may be divided into multiple regions
of predetermined equal size. The predetermined
size of the smaller regions depends upon several factors.
The larger the size, the more likely it will be that a particular
region contains several objects moving in different
directions. The choice of the size of the regions in the
step 305 also depends upon the size of the smallest moving
object in the scene for which motion is detected. Regions
of size sixteen by sixteen or thirty-two by thirty-two
pixels appear to be suitable for images in Quarter Common
Intermediate Format (QCIF). Larger sizes such as
sixty-four by sixty-four pixels, however, may also be appropriate
in certain applications. In step 310, each of the
smaller regions that results from step 305 is assigned an
initial motion vector.

In the preferred embodiment, the first time that step
310 is performed, the initial motion vector assigned to
each of the smaller regions is the motion vector with value
zero. An exemplary first level segmentation of a frame
400 is shown in FIG. 4. The frame 400 is divided into
multiple regions, each of which is rectangular and has
an initial motion vector of zero. The choice of an initial
motion vector with value zero is made to lessen the likelihood
that noise will be introduced.
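
Steps 305 and 310 for rectangular regions can be sketched as below. This is a minimal sketch under stated assumptions: the function name, the dictionary representation of the motion field and the 0-based pixel coordinates are choices made here, whereas the specification itself numbers pixels from (1,1). The 176 x 144 frame size is the QCIF luminance resolution.

```python
def first_level_segmentation(width, height, size):
    """Divide a frame into size x size regions (step 305) and assign each
    region the zero motion vector (step 310).

    Returns a dict mapping (x0, y0, region_width, region_height) to a vector.
    """
    field = {}
    for y0 in range(0, height, size):
        for x0 in range(0, width, size):
            # Regions on the right and bottom edges are clipped to the frame.
            region = (x0, y0, min(size, width - x0), min(size, height - y0))
            field[region] = (0, 0)
    return field


# QCIF luminance frame divided into sixteen by sixteen pixel regions.
field = first_level_segmentation(176, 144, 16)
```

With these dimensions the frame divides evenly into 11 columns and 9 rows of regions, so no clipping occurs.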

The motion vector of each smaller region is updated
according to a motion vector updating routine explained
below. The next step depends upon whether all the regions
in the current segmentation level have been processed
according to the motion vector updating routine
as indicated in step 320. If not all the regions have been
processed, then, as indicated in step 325, the remaining
regions are processed serially. The regions in the first
segmentation level need not be processed in any particular
sequence. The presently preferred sequence, however,
begins with a corner region, such as the region corresponding
to the upper-left of the current frame. The sequence
then proceeds across the upper row to the region
corresponding to the upper-right of the current frame.
Each subsequent row of regions is processed from left
to right until the region corresponding to the lower-right
of the current frame has been processed.

The first step in the motion vector updating routine 
is shown in step 330 in which a matching error is com- 
puted for the current region under consideration by using 
the initial motion vector assigned to that region. The ini- 
tial motion vector assigned to the current region may be 
temporarily stored in the file 221 of the memory unit 220 
by retrieving it from the motion field memory unit 215. In
step 332, the matching error obtained in step 330 is
stored in one of the files 241-250 in the memory unit 240.
Next, as shown in step 335, a motion estimation tech- 
nique is performed with respect to the region under con- 
sideration. The motion estimation technique may be any 
known technique in the prior art or any similar technique 
which attempts to improve the motion vector initially as- 
signed to the region whose motion vector is being updat- 
ed by minimizing the prediction or matching error. Step 
335 may be performed by the motion refinement unit 
260. 

One type of suitable motion estimation technique for 
use in step 335 involves block matching methods, which 
are well known in the art. Several block matching meth- 
ods are described in section 5.2.3g of the text by A.N. 
Netravali and B.J. Haskell, mentioned above. These 
methods include a full search, a 2D-logarithmic search, 
a three step search, and a modified conjugate direction 
search, any of which is suitable for performing the block 
matching motion estimation in the step 335. The block 
matching motion estimation technique is performed for 
the region under consideration and finds a best matched 
region in the preceding frame within a predetermined 
search range. For very low bit-rate applications, search 
ranges of between ±7 and ±15 pixels are suitable. The
block matching technique performed in step 335 results 
in a matched motion vector which serves as one of the 
candidates for refining or updating the motion vector of 
the particular region under consideration. The matched 
motion vector is a two dimensional vector representing 
the distance between the location of the current region 
and the location of the best matched region in the pre- 
ceding frame. 

Next, as shown in step 340, the motion vector ob- 
tained from step 335 is stored in the file 230 of the mem- 
ory unit 220. Also, as shown in step 342, a matching error 
is computed by assigning to the current region the motion 
vector obtained in step 335 and stored in the file 230. In 
step 344, the matching error computed in the step 342 
is stored in the memory unit 240. 

The next steps may be better understood with refer- 
ence to FIG. 5 which shows a frame 500 which has al- 
ready been divided into smaller regions. In FIG. 5, the 
presence of a diamond in a region indicates that the motion
vector updating routine has already been performed
with respect to that region. The region containing the V
is the region currently under consideration and whose
motion vector is being updated. Finally, the regions containing
a small circle are the regions neighboring the region
under consideration. The regions containing a small
circle may conveniently be referred to as neighboring regions.
Most of the regions have eight neighboring regions.
It will be noted, however, that the regions along
the sides of the frame have only five neighboring regions,
and the corner regions have only three neighboring regions.
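
The neighbor counts stated above follow directly from enumerating the eight surrounding grid positions and discarding those that fall outside the frame. The sketch below assumes a rectangular grid of equal-size regions addressed by 0-based (row, column) indices; the function name is hypothetical.

```python
def neighboring_regions(row, col, rows, cols):
    """Return the grid positions of the regions adjacent to (row, col),
    listed row by row from the upper-left neighbor to the lower-right one."""
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # skip the region under consideration itself
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:
                neighbors.append((r, c))
    return neighbors


# A 9 x 11 grid, as obtained for a QCIF frame with 16 x 16 pixel regions.
interior = neighboring_regions(4, 5, 9, 11)
side = neighboring_regions(0, 5, 9, 11)
corner = neighboring_regions(0, 0, 9, 11)
```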

In step 345, matching errors are computed by assigning
to the current region the motion vectors of each
of the neighboring regions. The motion vectors of the
neighboring regions may be temporarily stored in the
files 222-229 of the memory unit 220 by retrieving them
from the motion field memory unit 215. With respect to
the region under consideration in FIG. 5, it will be noted
that the motion vectors of some neighboring regions
have already been updated, whereas the motion vectors
of other neighboring regions have not yet been updated.
In any event, the current motion vector for each neighboring
region is used in the step 345. Next, in step 347,
the matching errors computed in the step 345 are stored
in the memory unit 240.

It may be noted that up to ten matching errors have
been computed with respect to the region under consideration.
These matching errors include the matching error
obtained by using the initial motion vector assigned
to the current region, the matching error computed by
assigning to the current region the matched motion vector
obtained from the block matching technique, and up
to eight matching errors obtained by assigning to the current
region the motion vectors of the neighboring regions.
In step 350, the minimum detector circuit 270, for
example, determines the smallest matching error among
the above enumerated matching errors. In step 355, a
best motion vector for the current region is selected from
among the motion vectors currently stored in the candidate
motion vector memory unit 220 according to a predetermined
threshold value and according to a priority
scheme, as further explained below with reference to
FIG. 6. The motion vectors currently stored in the memory
unit 220 include the initial motion vector assigned to
the current region, the matched motion vector obtained
in the step 335, and up to eight motion vectors of the
neighboring regions. Finally, in step 357, the best motion
vector selected in the step 355 is assigned to the current
region and stored in the motion field memory unit 215.
The step 357 is the last step in the motion vector updating
routine with respect to the current region. The process
of the present method continues with step 320.
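
The collection of candidate vectors and the search for the smallest matching error (step 350) can be sketched as follows. The function `evaluate_candidates` and the stand-in error function `toy_error` are hypothetical names introduced for illustration; in practice the error would be computed from the frame intensities as defined earlier in the specification.

```python
def evaluate_candidates(initial_v, matched_v, neighbor_vs, matching_error):
    """Compute one matching error per candidate motion vector and return the
    candidates, their errors and the smallest error (MIN)."""
    candidates = [initial_v, matched_v] + list(neighbor_vs)
    errors = [matching_error(v) for v in candidates]
    return candidates, errors, min(errors)


def toy_error(v):
    """Stand-in error for illustration only: distance from a 'true' motion
    of (2, -1). It is not computed from real frame data."""
    return abs(v[0] - 2) + abs(v[1] + 1)


candidates, errors, smallest = evaluate_candidates(
    initial_v=(0, 0),
    matched_v=(2, -1),
    neighbor_vs=[(1, 0), (2, 0)],
    matching_error=toy_error,
)
```

With eight neighboring regions the candidate list has ten entries, matching the "up to ten matching errors" noted above.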

Selection of the Best Motion Vector

FIG. 6 shows the process by which the best motion 
vector is selected in step 355. The steps shown in FIG.
6 are performed, for example, by the best motion vector
selection unit 280 in FIG. 2. A predetermined threshold 
value, which is chosen as a function of the size of the 
current region and a factor reflecting the amount of noise 
in the frame, is used to decide how the motion vector of 
the current region should be changed or updated. The 
basic idea is to substitute the matched motion vector or 
one of the motion vectors of the neighboring regions for 
the initial motion vector only if such a substitution yields 
a significant improvement in the matching error. The sig- 
nificance of an improvement is measured with respect to 
the predetermined threshold value, which contributes to 
the smoothness of the motion field and which reflects the 
amount of improvement obtained by assigning a partic- 
ular motion vector as the best motion vector relative to 
a smallest matching error. 

The process shown in FIG. 6 also indicates a pref- 
erence for reassigning to the current region the initial mo- 
tion vector assigned to the current region. This prefer- 
ence can be better understood by considering an image 
with a completely flat background and no motion. Any 
region in the flat area from the previous frame could be 
used to predict the current region during the block match- 
ing technique of the step 335. In real situations, however, 
there is always some noise added to the flat area. The 
noise causes random values to be assigned as the 
matched motion vector of each region. If no precaution 
is taken, then the flat areas will become noisy. The use 
of the threshold value combined with the initial assign- 
ment of motion vectors with value zero as well as the 
preference for the initial motion vector help prevent the 
introduction of noise. In addition, the process shown in 
FIG. 6 indicates a preference for the motion vectors of 
the neighboring regions over the matched motion vector 
in order to account for spatial consistency in the image. 

Once the smallest matching error (MIN) is deter- 
mined in step 350, selection of the best motion vector, 
as shown generally in step 355, starts in step 600 of FIG. 
6. In step 605, it is determined whether the absolute val- 
ue of the difference between the matching error (PE) ob- 
tained from the initial motion vector (PV) and the smallest 
matching error (MIN) is less than the predetermined 
threshold value (THR). If the absolute value in the step 
605 is less than the threshold value, this determination 
indicates that substituting one of the neighboring motion 
vectors or the matched motion vector for the current val- 
ue of the motion vector would not result in a significant 
improvement in the matching error. As shown in step 
610, the initial motion vector (PV) is, therefore, selected 
as the best motion vector and serves as the output of the 
best motion vector selection unit 280. The routine then 
would proceed to step 357. 

If, however, the absolute value in step 605 is not less
than the threshold value, then the process continues with
step 615. In step 615 and in each of steps 625, 635, 645,
655, 665, 675 and 685, it is determined whether the absolute
value of the difference between one of the matching
errors, obtained by assigning to the current region
the motion vector of one of the neighboring regions, and
the smallest matching error (MIN) is less than the predetermined
threshold value (THR). In steps 615, 625, 635,
645, 655, 665, 675 and 685, the terms E0, E1, E2, E3,
E4, E5, E6 and E7 refer respectively to a different one
of the matching errors obtained by assigning to the current
region a different one of the motion vectors of the
neighboring regions. Also, in FIG. 6, the motion vectors
corresponding to the matching errors E0, E1, E2, E3, E4,
E5, E6 and E7 are, respectively, V0, V1, V2, V3, V4, V5,
V6 and V7.

Although the process may proceed in any sequence
of the neighboring regions and their corresponding motion
vectors and matching errors, the preferred sequence
begins with the neighboring region which is to the upper-left
of the current region and proceeds in the same
order in which the motion vector updating routine is performed
as indicated above. Thus V0 would be the motion
vector of the upper-left neighboring region, and E0 would
be the matching error obtained by assigning V0 to the
current region. Similarly, V7 would be the motion vector
of the lower-right neighboring region, and E7 would be
the matching error obtained by assigning V7 to the current
region. If there are fewer than eight neighboring regions,
then only the steps corresponding to the existing
neighboring regions are performed. Alternatively, the
matching error (PE) obtained from the initial motion vector
(PV) may be used for the matching errors of the additional
steps.

In each of the steps 615, 625, 635, 645, 655, 665,
675 and 685, if the absolute value of the difference between
the matching error and the smallest matching error
(MIN) is less than the predetermined threshold value
(THR), then the motion vector corresponding to that
matching error is selected as the best motion vector as
shown in steps 610, 620, 630, 640, 650, 660, 670, 680
and 690, respectively. On the other hand, in each of the
steps 615, 625, 635, 645, 655, 665 and 675, if the absolute
value of the difference between the matching error
and the smallest matching error (MIN) is not less than
the predetermined threshold value (THR), then the process
evaluates the next matching error in the sequence.
Once a motion vector is selected as the best motion vector,
the method continues with step 357.

If the process continues until the step 685 and the
absolute value of the difference between the matching
error E7 and the smallest matching error (MIN) is not less
than the predetermined threshold value (THR), then this
determination indicates that the matching error, obtained
by assigning to the current region the matched motion
vector (NV), is the smallest matching error (MIN) and that
using the matched motion vector (NV) results in a significant
improvement in the matching error. As shown in
step 695, the matched motion vector (NV) is selected as
the best motion vector and serves as the output of the
best motion vector selection unit 280. The method then
continues with step 357.
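
The priority scheme described above can be illustrated with a minimal Python sketch. It assumes that the initial motion vector (PV) is tested first, as recited in the claims, followed by the neighboring vectors V0 through V7 in order, with the matched motion vector (NV) as the fallback. The function name, the argument names and the requirement that the threshold be positive are assumptions made for illustration only and are not taken from the specification.

```python
from typing import List, Tuple

Vector = Tuple[int, int]


def select_best_vector(
    pv: Vector,
    pe: float,
    neighbors: List[Tuple[Vector, float]],
    nv: Vector,
    ne: float,
    thr: float,
) -> Vector:
    """Pick the best motion vector by priority and threshold.

    pv, pe     -- initial motion vector and its matching error
    neighbors  -- (vector, matching error) pairs in priority order V0..V7
    nv, ne     -- matched motion vector and its matching error
    thr        -- predetermined threshold (assumed positive)
    """
    # MIN: the smallest matching error over all candidate vectors.
    smallest = min([pe, ne] + [err for _, err in neighbors])

    # Highest priority: keep the initial vector if it is close enough to MIN.
    if abs(pe - smallest) < thr:
        return pv

    # Next: the first neighboring vector whose error is close enough to MIN.
    for vector, err in neighbors:
        if abs(err - smallest) < thr:
            return vector

    # Otherwise only the matched vector gives a significant improvement.
    return nv
```

A larger threshold favors the initial and neighboring vectors and therefore a smoother motion field, while a smaller threshold favors the block-matched vector and a lower matching error.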
Processing Subsequent Segmentation Levels

Once all the regions in the first segmentation level 
are processed according to the motion vector updating 
routine, the method of the present invention proceeds, 
as shown in step 380, by evaluating whether a stop con- 
dition is reached or fulfilled. The stop condition may be, 
for example, a predetermined number of segmentation 
levels with respect to which the motion vector updating 
routine has been performed. Similarly, the stop condition 
may be a predetermined value of the total matching or 
prediction error for the current frame. Alternatively, the 
stop condition may be a lower limit upon the size of the 
regions with respect to which the motion vector updating 
routine is performed. In the preferred embodiment, the 
stop condition is reached when the size of the smallest 
regions is four by four pixels. Regions of four by four pix- 
els appear to represent a good balance between the 
competing goals of minimizing the matching errors and 
sending as little side information as possible to a receiver 
for decoding. In any event, an absolute lower limit of two
by two pixels is placed upon the size of the regions with
respect to which the updating routine is performed. This 
lower limit is equivalent to the restriction that the regions 
that result from successive performances of the updating 
routine are at least as large as two by two pixels. The 
reason for choosing this absolute lower limit will become 
apparent with respect to the preferred method for repre- 
senting the shape of merged regions, as further ex- 
plained below. 

If the stop condition has not been fulfilled, then the 
method proceeds, as shown in step 385, by dividing each
of the regions from the previous segmentation level into 
even smaller regions of a predetermined shape and size. 
In general, larger regions, which are divided into the 
smaller regions, may be referred to as parent regions. As
in step 305, each parent region may be divided into any 
predetermined shape. The predetermined shape of the 
regions in step 385 may be the same or different from 
the predetermined shape of the regions obtained by per- 
forming step 305. The preferred predetermined shape 
for use in step 385, however, also is rectangular. Each 
parent region, therefore, is divided into four smaller rec- 
tangular regions of equal size. FIG. 7 shows a second 
segmentation level of a frame 700 where the solid lines 
delineate the parent regions obtained by performing step 305, and
where the dotted lines delineate the smaller regions ob- 
tained by performing step 385. 

As shown in the step 310, each of the smaller re- 
gions that was divided from a parent region is assigned 
an initial motion vector. The initial motion vector assigned 
to each smaller region in the second segmentation level 
and in subsequent segmentation levels is the motion 
vector of the parent region from which it was divided in 
step 385. This assignment of initial motion vectors con- 
trasts with the assignment of initial motion vectors of val- 
ue zero to each region in the first segmentation level 
when step 310 is performed for the first time. 



Once initial motion vectors have been assigned to
each region in the second segmentation level, the method
of the present invention proceeds to perform the motion
vector updating routine by performing the steps 320
through 357 with respect to each of the smaller regions
in the second segmentation level. The result is that the
motion vector for each of the regions in the second segmentation
level is refined or updated. If the stop condition
is still not reached when all the regions in the second
segmentation level have been processed, then a third
segmentation level is created by dividing each region in
the second segmentation level into yet smaller regions
as indicated in step 385. Each region in the third segmentation
level is assigned an initial motion vector equal
to the motion vector of its parent region from the second
segmentation level, and the motion vector updating routine
is again performed with respect to each of the regions
in the third segmentation level. This process of
segmenting regions, assigning to each resulting smaller
region an initial motion vector equal to the motion vector
of its parent region from the preceding segmentation level,
and performing the motion vector updating routine
with respect to each region in the most recent segmentation
level continues until the stop condition is fulfilled
as shown in the step 380. It should be clear that there
may, therefore, be multiple segmentation levels in which
the motion vectors are continually refined and updated.
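
The loop over segmentation levels can be sketched in Python as follows. This is only an illustration under stated assumptions: regions are square, the frame dimensions are multiples of the first-level region size, the stop condition is a lower limit on region size, and the motion vector updating routine is passed in as a hypothetical callback named `update`. The names `Region`, `split` and `refine` do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = Tuple[int, int]


@dataclass
class Region:
    x: int
    y: int
    w: int
    h: int
    mv: Vector = (0, 0)  # first-level regions start with a zero vector


def split(parent: Region) -> List[Region]:
    """Divide a parent into four equal rectangles that inherit its vector."""
    hw, hh = parent.w // 2, parent.h // 2
    return [
        Region(parent.x + dx, parent.y + dy, hw, hh, parent.mv)
        for dy in (0, hh)
        for dx in (0, hw)
    ]


def refine(
    frame_w: int,
    frame_h: int,
    first_size: int,
    min_size: int,
    update: Callable[[Region, List[Region]], Vector],
) -> List[Region]:
    """Run the updating routine level by level until regions reach min_size."""
    regions = [
        Region(x, y, first_size, first_size)
        for y in range(0, frame_h, first_size)
        for x in range(0, frame_w, first_size)
    ]
    while True:
        # Update every region of the current segmentation level in turn.
        for region in regions:
            region.mv = update(region, regions)
        # Stop condition: the smallest allowed region size has been reached.
        if regions[0].w <= min_size:
            return regions
        # Otherwise create the next level; children inherit the parent vector.
        regions = [child for parent in regions for child in split(parent)]
```

With `first_size=16` and `min_size=4`, the sketch visits three segmentation levels (16, 8 and 4 pixels), which corresponds to the preferred stop condition of four by four pixels.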

The presently preferred sequence for performing the
motion vector updating routine with respect to rectangular
regions within the second and subsequent segmentation
levels is similar to the presently preferred sequence
of processing regions in the first segmentation
level, as shown in FIG. 7. In FIG. 7, the presence of a
diamond in a region indicates that the region has already
been processed. The 'x' marks the next region whose
motion vector is to be refined or updated. The presence
of a circle in a region indicates the neighboring regions
of the region whose motion vector is to be updated next. It
should be understood, however, that regions in the second
and subsequent segmentation levels may be processed
in any order according to the method of the
present invention.

One feature of the motion estimation and segmentation
technique described in detail above is that it seeks
simultaneously to reduce the matching or prediction error
and to increase the smoothness or spatial consistency
of the motion field. Another feature of the motion estimation
technique described above is that it seeks to reduce
the amount of noise introduced into the motion field
and is more robust to noisy scenes. Further features of
the present invention are described in greater detail below
with respect to the motion field coding.

Merging and Contour Coding of Regions 


Once the motion vector updating routine has been 
performed with respect to all the regions in the current 
segmentation level and the stop condition is fulfilled, the 
motion vector for each pixel in the current frame is processed
for transmission to a receiver (not shown). It is
preferable, however, to minimize the amount of informa- 
tion that must actually be sent to the receiver. Several 
techniques are known for coding the motion field of an 
image in a video frame. A preferred method of encoding 
the motion field, which is particularly advantageous in 
connection with the motion estimation process of the cur- 
rent invention, is shown in FIG. 3C. 

As shown in step 390, a merging process is per- 
formed which merges adjacent regions having similar 
motion vectors to form a merged region. In the preferred 
embodiment, adjacent rectangular regions, which share
a side and which have the same motion vector, are 
merged to form a merged region. FIG. 8 shows an ex- 
emplary frame 800 having merged regions A, B, C and 
O. In FIG. 8, the dotted lines delineate individual pixels, 
and the solid lines inside the frame 800 define the 
merged regions A, B, C and O. 
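
A minimal Python sketch of the merging step of the preferred embodiment is given below. It assumes that the motion field is available as a two-dimensional grid with one motion vector per smallest region, and it labels connected groups of regions that share a side and have the same motion vector by means of a breadth-first flood fill. The function name `merge_regions` and the choice of flood fill are illustrative assumptions; the specification does not prescribe a particular merging algorithm.

```python
from collections import deque
from typing import List, Tuple

Vector = Tuple[int, int]


def merge_regions(mv_grid: List[List[Vector]]) -> List[List[int]]:
    """Assign one integer label per merged region.

    Two grid cells belong to the same merged region when they are linked
    by a chain of side-sharing cells that all carry the same motion vector.
    """
    rows, cols = len(mv_grid), len(mv_grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # already part of a merged region
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                # Only cells sharing a side are candidates for merging.
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (
                        0 <= nr < rows
                        and 0 <= nc < cols
                        and labels[nr][nc] == -1
                        and mv_grid[nr][nc] == mv_grid[cr][cc]
                    ):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels
```

Note that two regions with the same motion vector that touch only at a corner, or not at all, receive different labels in this sketch.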

The shape and location of each merged region with- 
in the current frame may be represented, for example, 
by its contour. The contours of the merged regions must 
be labeled or represented in a manner that provides suf- 
ficient information so that the shape and location of the 
merged regions can uniquely be reconstructed at the re- 
ceiver. In the preferred embodiment, the contours are 
represented in a manner which minimizes the amount of 
information that must be sent to the receiver. 

As shown in step 391, each pixel in the current frame
is assigned a region label indicating to which merged re- 
gion it belongs. Next, a region shape representation rou- 
tine is performed for each of the pixels in the frame. The 
pixels are scanned line by line from left to right, starting 
with the pixel at the upper-left of the frame until all the
pixels in the current frame are processed as shown in 
step 392. If the region shape representation routine has 
not been performed with respect to all the pixels, then 
the process continues with the next pixel, as shown in 
step 393. 

As shown in step 394, the region label of the current 
pixel under consideration is compared to the region label 
of each pixel in a specified group of three neighboring 
pixels. The specified group of three neighboring pixels 
includes an adjacent pixel to the left of the pixel under 
consideration, an adjacent pixel above the pixel under 
consideration, and a pixel to the immediate upper-left of 
the pixel under consideration. It will be noted, however, 
that the upper-left pixel of the frame has no neighboring 
pixels as defined above. The upper-left pixel is, there- 
fore, never labelled as a contour pixel in this routine. Pix- 
els along the top and left sides of the frame have only 
one neighboring pixel as defined above. All other pixels 
have three neighboring pixels. If the region label of the 
current pixel is the same as each of the neighboring pix- 
els in the specified group, then the process continues 
with step 392. As shown in step 396, however, each pixel 
which has a region label that differs from the region label 
of at least one of the specified group of three neighboring
pixels is labelled as a contour pixel. The process then
continues with step 392. The region shape representa- 
tion routine, therefore, includes the steps 392-394 and 
396. 
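
The region shape representation routine can be sketched in Python as follows. The sketch assumes that the region labels are held in a two-dimensional list indexed as `labels[y][x]`, with the upper-left pixel at index `[0][0]`; the function name `contour_pixels` is hypothetical.

```python
from typing import List


def contour_pixels(labels: List[List[int]]) -> List[List[bool]]:
    """Mark each pixel whose label differs from its left, upper or
    upper-left neighbor as a contour pixel."""
    rows, cols = len(labels), len(labels[0])
    contour = [[False] * cols for _ in range(rows)]
    # Scan line by line, left to right, starting at the upper-left pixel.
    for y in range(rows):
        for x in range(cols):
            neighbors = []
            if x > 0:
                neighbors.append(labels[y][x - 1])      # left
            if y > 0:
                neighbors.append(labels[y - 1][x])      # above
            if x > 0 and y > 0:
                neighbors.append(labels[y - 1][x - 1])  # upper-left
            # The upper-left pixel of the frame has no neighbors and is
            # therefore never marked.
            contour[y][x] = any(n != labels[y][x] for n in neighbors)
    return contour
```

Because only the left, upper and upper-left neighbors are examined, each boundary between two merged regions is marked on one side only, namely on the pixels of the region lying to the right of or below the boundary.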

FIG. 9 shows a frame 900 that is obtained by performing
the region shape representation routine for each
pixel in FIG. 8. The pixels in FIG. 9 that are marked with
an 'x' represent the contour pixels. At a decoder at the
receiver, the inverse of the region shape representation
routine may be performed to recover the region labels corresponding
to each pixel inside the merged regions. The
region label assigned to the contour pixels at the decoder
is the same as the label assigned to the right, lower-right
or lower neighboring non-contour pixels. The region label
corresponding to the other pixels may be recovered
by any known filling algorithm. Such filling algorithms are
described more fully, for example, in Computer Graphics:
Principles and Practice, edited by J. Foley, A. Van
Dam, S. Feiner, and J. Hughes (Addison-Wesley Publishing
Co. 1987) and in D.H. Ballard and C.M. Brown,
Computer Vision (Prentice Hall, Inc. 1982). This recovery
technique assumes that any block of two by two pixels
contains at most three contour pixels, which is equivalent
to the assumption that the smallest segmented region
that is obtained by performing step 385 is two by
two pixels.

The contour pixels that uniquely define the merged
regions then may be encoded as shown in step 398. The
most popular technique for coding contour pixels is
known as chain coding, which consists of coding the position
of a pixel relative to the position of its neighboring
pixels. Chain coding is more fully described in H. Freeman,
"On the Encoding of Arbitrary Geometric Configurations,"
IRE Trans. on Elec. Comp., EC-10, pp.
260-268 (1961) (hereinafter "Freeman chain coding"),
which is incorporated by reference herein. This chain
coding technique as well as known variations of this technique
are suitable for use in the step 398. Once the step
398 of contour coding is performed, motion estimation
and encoding of the motion field for the current frame is
complete, as shown in step 399.

A preferred technique of contour coding, which is
particularly advantageous in conjunction with the motion
estimation technique of the present invention, is shown
in FIG. 10. The contour coding routine of FIG. 10, which
starts in step 1000, is a variation of the Freeman chain
coding technique. Each pixel is scanned in sequence,
for example, from upper-left to lower-right until all the
contour pixels have been encoded, as shown in step
1010. If all the contour pixels have not yet been encoded,
then the process continues with the next remaining contour
pixel as shown in step 1020.

As in other chain coding techniques, the basic idea
of the contour coding routine shown in FIG. 10 is to connect
straight segments of a contour, by specifying how
far the contour continues in the specified direction before
changing direction and by specifying the direction in
which the contour continues or whether an endpoint has
been reached.

The following steps may be better understood in
conjunction with FIGS. 11A-11C which show an exemplary
frame during various stages of the contour coding
routine. In FIGS. 11A-11C, the solid lines delineate individual
pixels, and the numbered pixels represent the
contour pixels. As shown in step 1030, the coordinates
(x, y) of an initial contour pixel, such as the contour pixel
1, are stored in memory. In step 1035, the initial contour
pixel is removed from consideration in subsequent steps
of the contour coding routine if it has fewer than two remaining
adjacent contour pixels. Next, in step 1040, the
direction in which a segment of adjacent contour pixels
continues is stored in memory. When a decision must be
made regarding the direction in which coding of a particular
segment should continue, the following preferred order
may be used: right, down, left or up. A different order,
however, may be used consistently with the present invention.
Once a direction is specified, the contour continues
in that direction so long as the next pixel is a contour
pixel which has not been coded and removed, as
explained below. As shown in step 1045, the length of
the initial segment is then stored by specifying the
number of pixels for which the contour continues in the
specified direction. Laemmel coding conveniently may
be used to code the variable length of the segments. Any
other suitable run-length code, however, also may be
used to code the variable length of the segments.

Next, as shown in step 1050, each contour pixel that
has been encoded and that has fewer than three remaining
adjacent contour pixels is removed from consideration
in subsequent steps of the contour coding routine.
In other words, those contour pixels which represent pixels
where two contours intersect are not yet removed
from consideration in the next loop of the contour coding
routine, even though they already have been encoded
once.

A particular contour ends either when it returns to its
starting point or when there are no more adjacent contour
pixels to be encoded. If a particular contour has not
ended, as shown in step 1060, then the next direction in
which the contour continues is stored in memory, as
shown in step 1070. The direction specified in step 1070
is limited to two choices with respect to the previous direction.
The processing continues with step 1045. The
steps 1050-1070 are repeated until the contour ends.
The fact that a pixel represents the last pixel of a particular
contour may be encoded by storing additional information
as shown in step 1080. For example, the last pixel
may be specified by storing the immediately previous direction
as the next direction code. Preferably, however,
the last pixel is specified by storing a segment length of
zero.

With reference to FIG. 11A, the first contour encoded
by the contour coding routine shown in FIG. 10 includes
the pixels 1-44. This first contour is encoded by storing
the following information: the absolute coordinates of
pixel 1, the direction "right," and a length of two; the direction
"down" and a length of two; the direction "right"
and a length of ten; the direction "down" and a length of
eight; the direction "left" and a length of twelve; the direction
"up" and a length of ten. The last pixel may be
indicated by storing the direction "up" as the next direction
or by storing a segment length of zero. FIG. 11B
shows the contour pixels which remain and still must be
encoded after performing a first loop of the contour coding
routine with respect to the pixels in FIG. 11A. Note
that the initial contour pixel 1 is not removed from consideration
until after it is encoded a second time by storing
the direction "up" and the length of ten.
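
The direction and length pairs listed above for the first contour can be replayed with a short Python sketch to check that they describe a closed path through 44 pixels. The sketch assumes that a segment length counts the number of pixel steps taken in the stated direction, that x increases to the right and y increases downward, and that pixel 1 is placed at the hypothetical coordinates (0, 0), since the actual coordinates in FIG. 11A are not given in the text.

```python
from typing import List, Tuple

# Unit steps for the four directions, with y increasing downward.
STEP = {"right": (1, 0), "down": (0, 1), "left": (-1, 0), "up": (0, -1)}


def trace(start: Tuple[int, int], segments: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return the pixels visited after leaving the start pixel."""
    x, y = start
    visited = []
    for direction, length in segments:
        dx, dy = STEP[direction]
        for _ in range(length):
            x, y = x + dx, y + dy
            visited.append((x, y))
    return visited


# Segments of the first contour as listed in the description of FIG. 11A.
first_contour = [
    ("right", 2),
    ("down", 2),
    ("right", 10),
    ("down", 8),
    ("left", 12),
    ("up", 10),
]

path = trace((0, 0), first_contour)
closed = path[-1] == (0, 0)      # the contour returns to its starting pixel
pixel_count = len(set(path))     # number of distinct contour pixels visited
```

Under these assumptions the six segments visit 44 distinct pixels and end on the starting pixel, which agrees with the statement that the first contour includes the pixels 1-44 and that pixel 1 is encoded a second time by the final "up" segment.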

The reason for removing only those contour pixels
that have fewer than three remaining adjacent contour
pixels becomes apparent, as explained more fully below,
when a fourth loop of the contour coding routine is performed
with respect to the pixels shown in FIG. 11A. A
second performance of the loop encodes the contour pixels
7, 45-51, and 31 and removes the pixels 7, 45-47,
49-51, and 31 from consideration. A third performance
of the loop encodes the contour pixels 11, 52-60, and 25,
and removes the pixels 11, 52-54, 56-60, and 25 from
consideration.

FIG. 11C shows the contour pixels which remain and
still must be encoded after performing three loops of the
contour coding routine with respect to the pixels in FIG.
11A. The remaining contour pixels 39, 61-63, 48, 64-66,
55, 67-69 and 19 may be encoded by performing one
final loop of the contour coding routine. In contrast, if all
the contour pixels that were encoded in any given loop
were removed from consideration in subsequent loops
of the contour coding routine, then at least six loops of
the Freeman chain coding technique would be required
to encode all the contour pixels in FIG. 11A. The amount
of information that must be sent to the receiver increases
as the number of loops increases because the coordinates
of the initial contour pixel in each loop are stored
for transmission as indicated in step 1030. Storing and
transmitting the coordinates, however, is costly in terms
of the amount of information that must be sent. The contour
coding described above, therefore, improves the efficiency
of the coding process.

A further improvement may be made to the coding 
process by normalizing the coordinates of the initial con- 
tour pixels as well as the length of the segments of the 
contours based upon the size of the smallest regions that
resulted from performance of the motion estimation tech- 
nique described above. For example, if the smallest re- 
gions have a size of two by two pixels, then dividing by 
two the values of the coordinates and the segment 
lengths prior to encoding will further improve the efficien- 
cy of the coding process. Once all the contour pixels 
have been encoded, the contour coding routine ends, as 
shown in step 1090. 
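
The normalization described above amounts to a simple division, as the following Python sketch illustrates. It assumes that pixel coordinates are counted from zero, so that the coordinates of an initial contour pixel and all segment lengths are exact multiples of the smallest region size; the function name `normalize` and the error raised for non-multiples are illustrative choices only.

```python
from typing import List, Tuple


def normalize(
    start: Tuple[int, int],
    segments: List[Tuple[str, int]],
    region_size: int,
) -> Tuple[Tuple[int, int], List[Tuple[str, int]]]:
    """Divide start coordinates and segment lengths by the smallest region size."""
    values = list(start) + [length for _, length in segments]
    if any(v % region_size for v in values):
        # Assumption of this sketch: every value is a multiple of region_size.
        raise ValueError("value is not a multiple of the smallest region size")
    x, y = start
    scaled_start = (x // region_size, y // region_size)
    scaled_segments = [(d, length // region_size) for d, length in segments]
    return scaled_start, scaled_segments
```

For example, with a smallest region size of two by two pixels, the segment lengths two, two, ten, eight, twelve and ten become one, one, five, four, six and five, so that smaller numbers need to be coded.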

Finally, the encoded information, including the con- 
tour of each merged region and the motion vector asso- 
ciated with each merged region is sent to a decoder in 
the video receiver where it is used together with the previous
frame information to predict the image in the current
frame. The motion vectors may be encoded in binary
code prior to sending them to the decoder. 

It should be noted that, as with other motion estimation
techniques, the motion estimation technique described
above will not necessarily account for the appearance
of previously covered portions of an image or
the disappearance of previously uncovered portions of 
an image. Nevertheless, a signal may be provided to the 
decoder alerting it to the fact that the motion estimation 
technique is considered a failure with respect to a par- 
ticular region or regions. For example, just prior to the 
merging step 390, the matching error of each region may 
be computed. If the computed matching error for any re- 
gion or regions exceeds a particular threshold value, 
then the region or regions are assigned a special label 
indicating that the motion estimation is considered a fail- 
ure for those regions. During the merging step 390, ad- 
jacent regions which are assigned the special label may 
be merged in the same manner as other regions are 
merged. The region shape representation routine and 
the contour coding routine then may be applied in the 
same manner as described above. 

FIG. 12 is a block diagram of a system 1200 incor- 
porating motion estimation, segmentation and coding 
according to the present invention. The system 1200 in- 
cludes the motion estimation unit 200 which has already 
been more fully described above. The motion estimation 
unit 200 is connected to a merge and label unit 1210. 
The merge and label unit 1210 receives the motion vec- 
tors assigned to regions of pixels and stored in the mo- 
tion field memory 215 of the motion estimation unit 200. 
The merge and label unit 1210 merges adjacent regions 
which have similar motion vectors to form merged re- 
gions. In a preferred embodiment, the unit 1210 merges 
adjacent regions which have the same motion vector. 
The merge and label unit 1210 also assigns to each pixel
in the current frame a region label indicating to which 
merged region it belongs. The merge and label unit 1210 
may be, for example, a dedicated logic circuit or a gen- 
eral purpose processor programmed with software to 
perform the merging and labelling functions. 

The merge and label unit 1210 is connected to two 
read-write memory units. A first region label memory unit 
1220 stores the motion vector associated with each 
merged region. A second region label memory unit 1230 
stores the region label assigned to each pixel in the cur- 
rent frame. The first region label memory unit 1220 may 
be connected to a motion vector coding unit 1260 which 
converts the motion vectors stored in the memory unit 
1220 into binary code for transmission to a receiver (not 
shown). The motion vector coding unit 1260 may be, for 
example, a special purpose circuit designed to convert 
the motion vectors into binary code or a general purpose 
processor programmed to perform the conversion. 

The second region label memory unit 1230 is connected
to a region shape representation unit 1240 which
performs the region shape representation routine described
above. The region shape representation unit
1240 forms a set of contour pixels that uniquely defines
the merged regions. The unit 1240 may be implemented
by a logic circuit or a general purpose processor programmed
to perform the region shape representation
routine. The region shape representation unit 1240 is
connected to a contour coding unit 1250 which encodes
the contour pixels according to the contour coding routine
described above. The contour coding unit 1250 may
also be, for example, a dedicated circuit or a general purpose
processor programmed to perform the contour coding
routine.

The system 1200 also has a control unit 1270 which
is connected to each of the other components in the sys- 
tem 1200. The control unit 1270 may be, for example, a 
central processing unit (CPU) or a processing element 
(PE) which controls the other components in the system 
1200 and their interaction. Also, the control unit 1270 and 
the control unit 290 both may be embodied in the same 
CPU or the same processing unit. Finally, the contour 
coding unit 1250 and the motion vector coding unit 1260
may be connected directly or indirectly to the receiver 
(not shown) so that the encoded contour pixels and the 
binary coded motion vectors may be transmitted to the 
receiver for decoding and prediction of the current frame. 

The system described above is particularly advan- 
tageous in very low bit-rate applications where reducing 
the amount of side information that must be sent to the 
decoder is an important factor. Applications that are suit- 
able for transmission of an encoded motion field at very 
low bit-rates include, for example, audio-visual mobile 
telecommunication systems, surveillance systems, and 
certain multimedia systems. 



Simulations were performed comparing the results 
obtained from the motion estimation, segmentation and 
coding method of the present invention to that of a full 
search block matching technique using the syntax of 
H.261, the international recommendation for video cod- 
ing. Frames from a sequence of Miss America in QCIF 
were used in the simulations, and the motion field be- 
tween every five original frames was calculated. A skip 
of four frames was chosen because it corresponds to the
typical situation used in very low bit-rate applications. 

FIG. 13 shows graphically the quality of the motion 
field prediction. The horizontal axis indicates the frame 
number, and the vertical axis indicates the peak-to-peak 
signal to noise ratio (PSNR) in units of decibels (dB). The 
quality of the motion field prediction is shown for three 
situations. The quality of the motion field prediction ob- 
tained by using the H.261 recommendation and blocks 
of eight by eight pixels is shown by the solid line. The 
quality of the motion field prediction obtained for two cases
using the motion estimation method of the present invention
is shown by the dotted and dashed lines, respectively.
In the latter two cases, a search range of ±7 pixels
was used for the block matching motion estimation technique.
In the case represented by the dotted lines, the
smallest regions for which the motion vector updating 
routine was performed had a size of eight by eight pixels. 
In the case represented by the dashed lines, the smallest
regions for which the motion vector updating routine was 
performed had a size of four by four pixels. 
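
The specification does not state how the PSNR plotted in FIG. 13 was computed. As a hedged illustration, the following Python sketch uses the conventional definition, 10·log10(peak² / MSE), and assumes 8-bit luminance samples with a peak value of 255; the function name `psnr` and the list-of-rows frame representation are assumptions made for this example.

```python
import math
from typing import List


def psnr(original: List[List[int]], predicted: List[List[int]], peak: int = 255) -> float:
    """Conventional PSNR in dB between an original and a predicted frame."""
    squared_error = 0
    count = 0
    for row_o, row_p in zip(original, predicted):
        for o, p in zip(row_o, row_p):
            squared_error += (o - p) ** 2
            count += 1
    mse = squared_error / count  # mean squared prediction error
    if mse == 0:
        return math.inf          # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

Under this definition, the average improvement of 0.5 dB reported below corresponds to a reduction of the mean squared prediction error by roughly 11 percent, since 10^(-0.05) is approximately 0.89.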

As may be seen from FIG. 13, the quality of predic- 
tion improves an average of 0.5 dB when the motion es- 
timation method of the present invention is used with re- 
gions having a size of four by four pixels compared to 
the quality obtained by using the full search technique. 

There may be situations where an improvement in 
quality over the full search technique is not required. As 
will become apparent, however, with respect to FIG. 14, 
the motion estimation method of the present invention 
provides other advantages over the full search technique 
even when no improvement in quality is required. FIG. 
14 shows graphically the number of bytes per frame that
are required to transmit the motion field information. The 
horizontal axis represents the number of the frame, and 
the vertical axis represents the number of bytes required. 
Five cases are represented on the graph in FIG. 14. In 
the first case, the H.261 syntax was used to perform the 
full search motion compensation as well as to code the 
motion vectors of the resulting regions. In the second 
case, the motion compensation method of the present 
invention was used with the smallest regions having a 
size of eight by eight pixels. The motion vector informa- 
tion was coded for transmission, however, using the 
H.261 syntax. In the third case, the motion estimation 
method of the present invention again was used with the 
smallest regions having a size of eight by eight pixels. 
Also, the motion vector information was coded using the 
merging step, the region shape representation routine, 
and the contour coding routine. In the fourth case, the 
motion estimation method of the present invention was 
used with the smallest regions having a size of four by 
four pixels. The motion vector information was coded for 
transmission using the H.261 syntax. In the fifth case, 
the motion estimation method of the present invention 
again was used with the smallest regions having a size 
of four by four pixels. The motion vector information was 
coded, however, using the merging step, the region 
shape representation routine, and the contour coding 
routine. 

It can be seen from FIGS. 13 and 14 that the number
of bytes required to transmit the motion vector informa- 
tion can be reduced significantly by using the motion es- 
timation method of the present invention. For example, 
more than a 50% reduction is achieved in the second 
case compared to the amount of information that must 
be sent in the first case. This reduction comes at the ex- 
pense, however, of an average drop of 1 dB in the quality 
of prediction as is evident from FIG. 13. Nevertheless, 
the visual quality obtained by sequentially displaying the 
predicted frames obtained from the first and second cases
is virtually indistinguishable. There is, therefore, a significant
advantage to using the motion estimation method
of the present invention even where an improvement
in the quality of the prediction is not required. 

Also, as indicated by the third and fifth cases in FIG.
14, using the preferred method of coding the motion field
information results in even greater reductions in the
amount of information that must be sent to the receiver.
The motion estimation method of the present invention,
in conjunction with the preferred coding technique, results
in significant improvements over the full search
block matching and motion field coding techniques currently
in use.

Although specific embodiments of the present invention
have been described in detail above, other applications
and arrangements within the spirit and scope
of the present invention will be readily apparent to persons
of ordinary skill in the art. The present invention is,
therefore, limited only by the appended claims.



1. An improved method of motion field estimation for
use in motion compensated frame-to-frame prediction
coding comprising the steps of:

dividing a frame having a plurality of pixels into
a plurality of smaller regions to form a first segmentation
level;

assigning to each of said plurality of smaller
regions an initial motion vector; and

performing for each of said plurality of smaller
regions a motion vector updating routine which
updates the motion vector of a smaller region by
assigning to it a best motion vector selected from
among the initial motion vector assigned to the
smaller region, a matched motion vector obtained
by performing a block matching technique for the
smaller region, and motion vectors of the smaller
region's neighboring regions, wherein the best
motion vector is selected according to a priority
scheme and a predetermined threshold value, and
wherein the threshold value reflects the amount of
improvement, obtained by assigning a particular
motion vector as the best motion vector, relative to
a smallest matching error.

2. The method of claim 1 wherein the motion vector
routine includes the steps of:

determining the smallest matching error from
among the matching errors obtained respectively by
assigning to the smaller region the following motion
vectors:

(a) the initial motion vector assigned to the
smaller region;

(b) the matched motion vector obtained by performing
a block matching technique for the
smaller region; and

(c) the motion vectors of the smaller region's
neighboring regions;

5 

selecting the initial motion vector as the best 
motion vector if the absolute value of the difference 
between the smallest matching error and the match- 
ing error obtained by using the initial motion vector 
is less than the predetermined threshold value; io 

selecting the motion vector of one of the neigh- 
boring regions as the best motion vector if: 

(a) the absolute value of the difference between 

the smallest matching error and the matching 15 
error obtained by using the initial motion vector 
is not less than the predetermined threshold 
value; and 

(b) the absolute value of the difference between 20 

the smallest matching error and the matching 
error obtained by assign ing to the smaller region 
the motion vector of the neighboring region is 
less than the predetermined threshold value; 
and 25 

selecting the matched motion vector as the 
best motion vector if: 

(a) the absolute value of the difference between 30 
the smallest matching error and the matching 
error obtained by using the initial motion vector 

is not less than the predetermined threshold 
value; and 

35 

(b) the absolute value of the difference between 
the smallest matching error and each of the 
matching errors obtained by assigning to the 
smaller region the motion vector of one of the 
neighboring region is not less than the prede- 40 
termined threshold value. 
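
The priority scheme recited in claim 2 can be sketched as follows. This is an illustrative reading rather than a definitive implementation: each candidate is assumed to be a (motion vector, matching error) pair, and when several neighboring vectors qualify the sketch picks the first one in the list, a tie-breaking rule the claim does not specify. The function name is hypothetical.

```python
def select_best_motion_vector(initial, matched, neighbors, threshold):
    """initial and matched are (motion_vector, matching_error) pairs;
    neighbors is a list of such pairs. Returns the chosen motion vector."""
    candidates = [initial, matched] + list(neighbors)
    smallest = min(err for _, err in candidates)

    # First priority: keep the initial vector when its error is within
    # the threshold of the smallest matching error.
    if abs(smallest - initial[1]) < threshold:
        return initial[0]

    # Second priority: a neighboring region's vector that is within the
    # threshold (assumption: the first qualifying neighbor is taken).
    for mv, err in neighbors:
        if abs(smallest - err) < threshold:
            return mv

    # Otherwise fall back to the block-matched vector.
    return matched[0]
```

With an initial error of 100, a matched error of 95 and one neighbor error of 98, a threshold of 10 keeps the initial vector, a threshold of 4 selects the neighbor's vector, and a threshold of 2 selects the matched vector.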

3. The method of claim 2 wherein the initial motion vector assigned to
each smaller region in the first segmentation level has the value zero.

4. The method of claim 3 further including the steps of:

(a) dividing each smaller region in the previous segmentation level
into a plurality of smaller regions of predetermined shape and size to
form a subsequent segmentation level;

(b) assigning to each of the plurality of smaller regions in the
subsequent segmentation level an initial motion vector equal to the
motion vector of its parent region; and

(c) performing the motion vector updating routine for each of said
plurality of smaller regions in the subsequent segmentation level.

5. The method of claim 4 further comprising the step of repeatedly
performing the steps (a), (b) and (c) specified in claim 4 until a stop
condition is reached.

6. The method of claim 5 wherein the shape of each 
smaller region is rectangular. 

7. The method of claim 6 wherein the size of the 
smaller regions is the same for each smaller region 
in a particular segmentation level. 

8. The method of claim 7 wherein the stop condition is 
a lower bound on the size of the smaller regions. 

9. The method of claim 7 wherein the stop condition is 
a predetermined value of a total matching error for 
the frame. 

10. The method of claim 7 wherein the stop condition is a predetermined
number of segmentation levels with respect to which the motion vector
updating routine is performed.
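
The level-by-level refinement of claims 3 to 5, together with the stop conditions of claims 8 and 10, can be sketched as a loop. The sketch below makes several assumptions that are not in the claims: the frame is square with a side that is a power-of-two multiple of the region size, each region is split into four equal squares, and `update` is a hypothetical callback standing in for the motion vector updating routine (it receives only the region itself, whereas the claimed routine also consults neighboring regions). The total-matching-error stop condition of claim 9 is omitted.

```python
def refine_hierarchically(frame_side, first_size, update,
                          min_size=None, max_levels=None):
    """Return a list of (top, left, size, motion_vector) regions.
    update(top, left, size, initial_mv) must return the refined vector."""
    # First segmentation level: every region starts with the zero vector.
    regions = [(top, left, first_size, (0, 0))
               for top in range(0, frame_side, first_size)
               for left in range(0, frame_side, first_size)]
    level = 1
    while True:
        # Run the updating routine on every region of this level.
        regions = [(t, l, s, update(t, l, s, mv)) for (t, l, s, mv) in regions]
        half = regions[0][2] // 2
        if half < 1:
            break                      # regions cannot be split further
        if min_size is not None and half < min_size:
            break                      # lower bound on region size
        if max_levels is not None and level >= max_levels:
            break                      # fixed number of segmentation levels
        # Split each region into four children that inherit the parent's
        # motion vector as their initial vector.
        regions = [(t + dt, l + dl, half, mv)
                   for (t, l, s, mv) in regions
                   for dt in (0, half) for dl in (0, half)]
        level += 1
    return regions
```

In a fuller version the callback would wrap the block matching and best-vector selection steps and would be given access to the current motion field so that neighboring vectors can be considered.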

11. The method of claim 7 further comprising the steps of:

merging adjacent regions having similar motion vectors;

assigning a region label to each of said plurality of pixels;

performing a region shape representation routine for each of said
plurality of pixels to define a plurality of contour pixels; and

performing a contour coding routine to encode each of said plurality of
contour pixels.
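
The merging, labeling and contour steps of claim 11 can be illustrated with a small sketch. It is not the region shape representation routine of the specification: it assumes a per-pixel motion field stored as a list of rows, treats "similar" as exactly equal, merges 4-connected pixels by flood fill, and marks as a contour pixel any pixel whose right or lower neighbor carries a different label. All names are hypothetical.

```python
from collections import deque


def label_regions(mv_field):
    """Assign one integer label per 4-connected area of equal motion vector."""
    h, w = len(mv_field), len(mv_field[0])
    labels = [[None] * w for _ in range(h)]
    next_label = 0
    for r0 in range(h):
        for c0 in range(w):
            if labels[r0][c0] is not None:
                continue
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one merged region
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < h and 0 <= nc < w
                            and labels[nr][nc] is None
                            and mv_field[nr][nc] == mv_field[r][c]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels


def contour_pixels(labels):
    """Pixels whose right or lower neighbor belongs to another region."""
    h, w = len(labels), len(labels[0])
    return [(r, c) for r in range(h) for c in range(w)
            if (c + 1 < w and labels[r][c + 1] != labels[r][c])
            or (r + 1 < h and labels[r + 1][c] != labels[r][c])]
```

For a 2 x 4 field whose left half moves with one vector and whose right half with another, the sketch yields two labels and marks the two pixels in the second column as contour pixels.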

12. An apparatus for use in improved motion field estimation of a video
image, said apparatus comprising:

a previous decoded frame memory unit for storing a monochrome intensity
for each pixel in a preceding frame;

a current frame memory unit for storing a monochrome intensity for each
pixel in a current frame;

a motion field memory unit for storing a motion vector for each pixel
in the current frame;

a motion refinement unit for performing a block matching technique for
a block of pixels in the current frame and for providing a matched
motion vector;

a candidate motion vector memory unit for storing motion vectors which
are candidates for updating the motion vector of said block of pixels,
wherein the candidate motion vector memory unit is connected to said
motion field memory unit for receiving and storing an initial motion
vector assigned to said block of pixels and motion vectors of regions
neighboring said block of pixels, and wherein the candidate motion
vector memory unit is further connected to said motion refinement unit
for receiving and storing the matched motion vector;

a matching error computing unit for computing matching errors obtained
by assigning to said block of pixels the motion vectors stored in the
candidate motion vector memory unit;

a matching error memory unit for storing the matching errors computed
by said matching error computing unit;

a minimum detector unit for determining the value of the smallest
matching error among the matching errors stored in the matching error
memory unit;

a best motion vector selection unit, connected to the motion field
memory unit, for determining, according to a priority scheme and a
predetermined threshold value, a best motion vector for refining the
motion vector of the block of pixels, wherein the best motion vector is
selected from among the motion vectors stored in the candidate motion
vector memory unit; and

a control unit for controlling the other units and their interaction,
wherein the control unit is connected to each of the other units in the
apparatus.

13. The apparatus of claim 12 wherein the best motion vector selection
unit performs the following functions:

selecting the initial motion vector as the best motion vector if the
absolute value of the difference between the smallest matching error
and the matching error obtained by using the initial motion vector is
less than the predetermined threshold value;

selecting the motion vector of one of the neighboring regions as the
best motion vector if:

(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion
vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching
error and the matching error obtained by assigning to the smaller
region the motion vector of the neighboring region is less than the
predetermined threshold value; and

selecting the matched motion vector as the best motion vector if:

(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion
vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching
error and each of the matching errors obtained by assigning to the
smaller region the motion vector of one of the neighboring regions is
not less than the predetermined threshold value.

14. A system for performing motion field estimation and coding of a
video image, said system comprising:

a motion estimation unit for refining a motion vector assigned to a
block of pixels in a current frame by assigning to the block of pixels
a best motion vector selected from among a plurality of candidate
motion vectors, wherein said best motion vector is selected according
to a priority scheme and a predetermined threshold value that reflects
the amount of improvement, obtained by assigning a particular motion
vector as the best motion vector, relative to a smallest matching
error, and wherein said plurality of candidate motion vectors includes
an initial motion vector assigned to said block of pixels, a matched
motion vector obtained by performing a block matching technique for
said block of pixels, and motion vectors of regions neighboring said
block of pixels;

a merge and label unit for merging adjacent regions of the current
frame that have similar motion vectors to form merged regions and for
assigning to each pixel in the current frame a region label;

a first region label memory unit for storing a motion vector associated
with each merged region;

a second region label memory unit for storing the region label assigned
to each pixel in the current frame;

a region shape representation unit for forming a set of contour pixels
that defines the merged regions;

a contour coding unit for encoding the set of contour pixels; and

a control unit for controlling the other units and their interaction,
wherein the control unit is connected to each of the other units in the
system.

15. The system of claim 14 wherein the motion estimation unit
comprises:

a previous decoded frame memory unit for storing a monochrome intensity
for each pixel in a preceding frame;

a current frame memory unit for storing a monochrome intensity for each
pixel in a current frame;

a motion field memory unit for storing a motion vector for each pixel
in the current frame;

a motion refinement unit for performing a block matching technique for
a block of pixels in the current frame and for providing a matched
motion vector;

a candidate motion vector memory unit for storing motion vectors which
are candidates for updating the motion vector of said block of pixels
wherein the candidate motion vector memory unit is connected to said
motion field memory unit for receiving and storing an initial motion
vector assigned to said block of pixels and motion vectors of regions
neighboring said block of pixels, and wherein the candidate motion
vector memory unit is further connected to said motion refinement unit
for receiving and storing the matched motion vector;

a matching error computing unit for computing matching errors obtained
by assigning to said block of pixels the motion vectors stored in the
candidate motion vector memory unit;

a matching error memory unit for storing the matching errors computed
by said matching error computing unit;

a minimum detector unit for determining the value of the smallest
matching error among the matching errors stored in the matching error
memory unit; and

a best motion vector selection unit, connected to the motion field
memory unit, for determining, according to a priority scheme and a
predetermined threshold value, a best motion vector for refining the
motion vector of the block of pixels, and wherein the best motion
vector is selected from among the motion vectors stored in the
candidate motion vector memory unit.

16. The system of claim 15 wherein the best motion vector selection
unit performs the following functions:

selecting the initial motion vector as the best motion vector if the
absolute value of the difference between the smallest matching error
and the matching error obtained by using the initial motion vector is
less than the predetermined threshold value;

selecting the motion vector of one of the neighboring regions as the
best motion vector if:

(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion
vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching
error and the matching error obtained by assigning to the smaller
region the motion vector of the neighboring region is less than the
predetermined threshold value; and

selecting the matched motion vector as the best motion vector if:

(a) the absolute value of the difference between the smallest matching
error and the matching error obtained by using the initial motion
vector is not less than the predetermined threshold value; and

(b) the absolute value of the difference between the smallest matching
error and each of the matching errors obtained by assigning to the
smaller region the motion vector of one of the neighboring regions is
not less than the predetermined threshold value.






EP 0 691 789 A2 







[FIG. 3A: flowchart. Legible box text:]

300 START (remainder of label illegible)
305 DIVIDE CURRENT FRAME INTO SMALLER REGIONS OF PREDETERMINED SHAPE
310 ASSIGN TO EACH REGION AN INITIAL MOTION VECTOR
330 COMPUTE MATCHING ERROR OBTAINED WITH INITIAL MOTION VECTOR
332 STORE MATCHING ERROR OBTAINED IN STEP 330
335 PERFORM A BLOCK MATCHING MOTION ESTIMATION USING A PREDETERMINED
SEARCH RANGE

Boxes whose reference numerals cannot be matched (numerals 320, 325 and
380 also appear): GO TO NEXT REGION; SPLIT EACH REGION INTO SMALLER
REGIONS; a branch labeled YES.

Connectors: FROM FIG. 3B; TO FIG. 3B.



[FIG. 3B: flowchart. Legible box text:]

340 STORE MOTION VECTOR OBTAINED IN STEP 335
342 COMPUTE MATCHING ERROR OBTAINED BY ASSIGNING TO THE CURRENT REGION
THE MOTION VECTOR STORED IN STEP 340
344 STORE MATCHING ERROR OBTAINED IN STEP 342
345 COMPUTE MATCHING ERRORS OBTAINED BY ASSIGNING TO THE CURRENT REGION
THE MOTION VECTORS OF THE NEIGHBORING REGIONS
347 STORE MATCHING ERRORS COMPUTED IN STEP 345
350 DETERMINE THE SMALLEST MATCHING ERROR AMONG THE MATCHING ERRORS
STORED IN THE STEPS 332, 344 AND 347
355 SELECT THE BEST MOTION VECTOR FROM AMONG THE INITIAL MOTION VECTOR,
THE MOTION VECTOR STORED IN STEP 340, AND THE MOTION VECTORS OF THE
NEIGHBORING REGIONS BASED UPON A PREDETERMINED THRESHOLD VALUE AND A
PRIORITY SCHEME
357 ASSIGN TO THE CURRENT REGION THE BEST MOTION VECTOR SELECTED IN
STEP 355

Connectors: FROM FIG. 3A; TO FIG. 3A.



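[FIG. 3C: flowchart. Legible box text:]

390 MERGE ADJACENT REGIONS HAVING THE SAME MOTION VECTOR
391 ASSIGN REGION LABEL TO EACH PIXEL
PERFORM CONTOUR CODING (reference numeral not legible)

Connector: FROM FIG. 3A.

Further box text on the same sheet, not attributable to a numbered
step: LABEL THE CURRENT PIXEL AS A CONTOUR PIXEL.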



[FIG. 4: drawing, reference numeral 400]

[FIG. 5: drawing, reference numeral 500]

[FIG. 7: drawing, reference numeral 700]



[FIG. 8: drawing, reference numeral 800]

[FIG. 9: drawing, reference numeral 900]



[FIG. 10: flowchart, reference numeral 1000 (START CONTOUR CODING).
Legible box text, in the order extracted; the step numerals 1030, 1035,
1040, 1045, 1050, 1060, 1070, 1080 and 1090 appear but cannot be
matched to boxes:]

START CONTOUR CODING
END CONTOUR CODING
STORE INFO INDICATING END OF CONTOUR
GO TO NEXT REMAINING CONTOUR PIXEL
STORE COORDINATES OF INITIAL CONTOUR PIXEL
REMOVE INITIAL CONTOUR IF IT HAS FEWER THAN TWO REMAINING ADJACENT
CONTOUR PIXELS
STORE DIRECTION IN WHICH TO CONTINUE
STORE LENGTH OF SEGMENT
REMOVE CONTOUR PIXELS HAVING FEWER THAN THREE REMAINING ADJACENT
CONTOUR PIXELS
STORE NEXT DIRECTION



[FIG. 11A: grid drawing with numbered cells]



[FIG. 11B: grid drawing with numbered cells]



[FIG. 11C: grid drawing with numbered cells]



[FIG. 13: plot with axis label FRAME. Legend:]

FULL SEARCH USING H.261 SYNTAX, 8X8 REGIONS
MOTION ESTIMATION OF PRESENT INVENTION, 8X8 REGIONS
MOTION ESTIMATION OF PRESENT INVENTION, 4X4 REGIONS



